Poster: Pa1121 Date: Jan 6, 2012 5:44pm
Forum: web Subject: Why is my robots.txt ignored and why is the historical pages still here?

In 2009 we added a robots.txt file to our site to prevent this site/Alexa from crawling our website. It worked fine until the other day. Now our entire site is back in the Wayback Machine, even though it is stated several times on this site that adding a robots.txt file with the text

User-agent: ia_archiver
Disallow: /

should REMOVE all earlier historical documents from this site. All the old historical pages are back, all the way down to 2005. This means that the statement that this site does not want to include content from sites that are not interested in being here is simply incorrect. On top of that, having the robots.txt file in place for well over two years did not make the Wayback Machine delete any of the earlier crawls. It seems they were just hidden, since they are magically back now, every one of them.

We sent an email the other day to infoATarchive.org to get our site totally removed. There has been no reply at all, and the site is still in the Wayback history. We have re-uploaded the robots.txt to our root folder, but it seems to be totally ignored.
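
For anyone who wants to rule out a problem with the file itself, the two-line robots.txt quoted above can be checked locally. The following is a minimal Python sketch, not anything the Internet Archive publishes: it uses the standard library's urllib.robotparser to show how a standards-following parser reads those rules. The helper name is_blocked and the example.com URL are made up for illustration, and the result says nothing about how the Wayback Machine itself treats the file or its existing snapshots.

```python
from urllib.robotparser import RobotFileParser

# The exact rules quoted in the post.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url.

    Hypothetical helper for illustration; it only reflects how Python's
    standard parser interprets the file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/any/page.html"  # placeholder URL
    print(is_blocked(ROBOTS_TXT, "ia_archiver", page))   # True
    print(is_blocked(ROBOTS_TXT, "SomeOtherBot", page))  # False
```

If the first call reports False when run against the text of your own file, the rules are not being read as intended (for example because of a typo in the user-agent name), and that is worth fixing before contacting anyone.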



This post was modified by Pa1121 on 2012-01-06 10:15:44

This post was modified by Pa1121 on 2012-01-06 10:17:42

This post was modified by Pa1121 on 2012-01-07 01:44:49

Poster: jory2 Date: Jan 6, 2012 9:23am
Forum: web Subject: Re: Why is my robots.txt ignored and why is the historical pages still here?

We were given the same treatment by this website and by the admin responsible for the Wayback Machine, Chris Butler (info@archive.org).
His reasons and excuses varied depending on the day.
At one point he explained that this website does not delete content it collects; another time the reason was that the Internet Archive's Archive-It project, which is a paid service, would suffer if they began deleting content.
You can always contact http://www.cybercrime.gov//ip.html
and explain your situation.

You may want to check this site as well:
http://archive.bibalex.org, the Internet archive at the New Library of Alexandria, Egypt, mirrors the Wayback Machine. Try your search there when you have trouble connecting to the Wayback servers.

Best of luck



This post was modified by jory2 on 2012-01-06 17:23:30

Poster: Pa1121 Date: Jan 6, 2012 5:43pm
Forum: web Subject: Re: Why is my robots.txt ignored and why is the historical pages still here?

The issue has been solved to our full satisfaction.

Poster: courtc2911 Date: Mar 13, 2012 2:29pm
Forum: web Subject: Re: Why is my robots.txt ignored and why is the historical pages still here?

We are having the same issues with this organization.

We have posted the robots.txt file on our server and emailed multiple times to all of the various contact email addresses on this site, with no response whatsoever and no site removal.

What have any/all of you done to get your site/pages removed?

How have you handled this company? Have you reported them to http://www.cybercrime.gov/?

Thanks for the help!

Poster: ka1 Date: Jan 18, 2012 3:52am
Forum: web Subject: Re: Why is my robots.txt ignored and why is the historical pages still here?

I have the same problem. After I gave them all the information they told me they need to delete my old site (it's not possible for me to add a robots.txt because the site is not active anymore), there was no reply and no action in 2 weeks. I contacted them again and again. Nothing happens. Is http://www.cybercrime.gov/ the only way?
What did you guys do to get a response from archive.org?

Poster: courtc2911 Date: Mar 13, 2012 2:26pm
Forum: web Subject: Re: Why is my robots.txt ignored and why is the historical pages still here?

We are having the same issues with this organization.

We have posted the robots.txt file and emailed multiple times to all of the various contact email addresses on this site, with no response whatsoever.

What have any/all of you done to get your site/pages removed?

How have you handled this company? Have you reported them to http://www.cybercrime.gov/?

Thanks for the help!